Multilingual OCR system for South Indian scripts and English documents: An approach based on Fourier transform and principal component analysis

نویسندگان

  • V. N. Manjunath Aradhya
  • G. Hemantha Kumar
  • S. Noushath
چکیده

Character recognition lies at the core of the discipline of pattern recognition where the aim is to represent a sequence of characters taken from an alphabet [Kasturi, R., Gorman, L.O., Govindaraju, V., 2002. Document image analysis: a primer. Sadhana 27 (Part 1), 3–22]. Though many kinds of features have been developed and their test performances on standard database have been reported, there is still room to improve the recognition rate by developing improved features. In this paper, we present a multilingual character recognition system for printed South Indian scripts (Kannada, Telugu, Tamil and Malayalam) and English documents. South Indian languages are most popular languages in India and around the world. The proposed multilingual character recognition is based on Fourier transform and principal component analysis (PCA), which are two commonly used techniques of image processing and recognition. PCA and Fourier transforms are classical feature extraction and data representation techniques widely used in the area of pattern recognition and computer vision. Our experimental results show the good performance over the data sets considered. r 2007 Elsevier Ltd. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Multiple Feature based Novel Approach for Identification of Printed Indian Scripts at Word Level

In a country like India where different scripts are in use, automatic identification of printed script facilitates many important applications such as automatic transcription of multilingual documents and for the selection of script specific OCR in a multilingual environment. In this paper a novel method to identify the script type of the collection of documents printed in seven Indian language...

متن کامل

Global Approach for Script Identification using Wavelet Packet Based Features

In a multi script environment, an archive of documents having the text regions printed in different scripts is in practice. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify different script regions of the document. In this paper, a novel texture-based approach is presented to identify the script type of the collection of documen...

متن کامل

A Script Recognizer Independent Bi-lingual Character Recognition System for Printed English and Kannada Documents

Department of Computer Science Amrita Vishwa Vidyapeetham, Mysore Campus Bogadi, Mysore INDIA _____________________________________________________________________________________ Abstract: Recognition of text document images is the inclination of any optical character recognition systems. This paper aims at extending the functionality of optical character recognition system to recognize more t...

متن کامل

A Comparative Analysis of Classifiers Accuracies for Bilingual Printed Documents (Oriya-English)

Bilingual document recognition has been the subject of intensive research and our focus is on the recognition of an Oriya-English bilingual documents. In most of our official papers, school text books, it is observed that English words interspersed within the Indian languages. So there is need for an Optical Character Recognition (OCR) system which can recognize these bilingual documents and st...

متن کامل

Script Identification from Bilingual Gujarati-English Documents

In a multi-lingual country like India, in most of the official papers, school text books, magazines, it is observed that English words intersperse within the Indian regional languages. So a bilingual Optical Character Recognition (OCR) system is needed which can recognize these bilingual documents and store it for future use. In this paper authors present an OCR system developed for the script ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Eng. Appl. of AI

دوره 21  شماره 

صفحات  -

تاریخ انتشار 2008